64 research outputs found

    A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

    In the era of petascale computing, more scientific applications are being deployed on leadership-scale computing platforms to enhance scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not proven adequate for many mission-critical applications, particularly in the data post-processing stage. For example, some scientific applications generate datasets composed of a vast number of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Folding such dimensional knowledge into the data organization can benefit the efficiency of data post-processing, yet it is often missing from existing I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high-performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before storing them to storage. This technique not only facilitates the common access patterns of data analytics but also reduces the application turnaround time. In particular, STAR enables efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with GEOS-5, a mission-critical climate modeling application, experimental results on the Jaguar supercomputer demonstrate an improvement of up to 73 times in read performance compared to the original I/O method.
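
    The abstract stops short of implementation detail, but the core idea is to choose a storage layout in which the dominant query dimension is contiguous. The sketch below illustrates that concept with invented names and a toy (x, y, t) layout; it is not STAR's actual code:

        import numpy as np

        # Scattered (t, x, y, value) elements are packed so that the time
        # dimension varies fastest: a time-series query at one location
        # then becomes a single contiguous read.
        def aggregate(elements, nt, nx, ny):
            cube = np.full((nx, ny, nt), np.nan)
            for t, x, y, value in elements:
                cube[x, y, t] = value
            return cube

        def time_series(cube, x, y):
            # Temporal query: every timestep at one spatial point.
            return cube[x, y, :]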

    Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application

    Exascale computing systems will soon emerge, posing great challenges due to the widening gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life, and the huge amounts of data they generate require highly parallel and efficient I/O management policies. In this paper, we adopt a mission-critical scientific application, GEOS-5, as a case study to profile and analyze the communication and I/O issues that prevent applications from fully utilizing the underlying parallel storage systems. Through detailed architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overheads and are unable to fully parallelize the data access, thus degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign the application's I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA Discover cluster show that our optimization of GEOS-5 with ADIOS leads to significant performance improvements compared to the original GEOS-5 implementation.
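
    The redesign itself is built on ADIOS inside GEOS-5; as a generic illustration of the principle being applied, the mpi4py sketch below has every rank write its own slab of a shared file in one collective call instead of funneling data through a single writer. The file name and sizes are made up:

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        # Each rank owns one slab of the global array.
        local = np.full(1024, rank, dtype=np.float64)

        # Collective, fully parallel write: every rank lands its slab at
        # its own offset, with no gather onto a single writer process.
        fh = MPI.File.Open(comm, "field.dat",
                           MPI.MODE_CREATE | MPI.MODE_WRONLY)
        fh.Write_at_all(rank * local.nbytes, local)
        fh.Close()

    Launched with, for example, mpiexec -n 16 python slab_write.py (a hypothetical script name), this produces one file holding all sixteen slabs.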

    Enhancing MPI with modern networking mechanisms in cluster interconnects

    Advances in CPU and networking technologies make it appealing to aggregate commodity compute nodes into ultra-scale clusters. But the achievable performance is highly dependent on how tightly their components are integrated together. The ever-increasing size of clusters, and of the applications running over them, leads to dramatic changes in the requirements. These include at least scalable resource management, fault-tolerant process control, scalable collective communication, as well as high-performance and scalable parallel IO. Message Passing Interface (MPI) is the de facto standard for the development of parallel applications. Many research efforts are actively studying how to leverage the best performance of the underlying systems and present it to the end applications. In this dissertation, we exploit various modern networking mechanisms from contemporary interconnects and integrate them into MPI implementations to enhance their performance and scalability. In particular, we have leveraged the novel features available from InfiniBand, Quadrics and Myrinet to provide scalable startup, adaptive connection management, scalable collective operations, as well as high-performance parallel IO. We have also designed a parallel Check…
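
    Of the mechanisms listed, adaptive connection management is the simplest to illustrate in isolation: a process opens a connection only when it first communicates with a peer, rather than pre-establishing all N-to-N connections at startup. The toy sketch below uses invented names and is not code from the dissertation:

        class ConnectionManager:
            """Toy model of adaptive (on-demand) connection management."""

            def __init__(self, open_connection):
                self._open = open_connection  # callback building one endpoint
                self._peers = {}              # peer rank -> live connection

            def send(self, peer, message):
                # Connect lazily on the first message to a peer, so resource
                # usage tracks the actual communication pattern rather than
                # growing with the full process count.
                if peer not in self._peers:
                    self._peers[peer] = self._open(peer)
                self._peers[peer].send(message)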

    High Performance Broadcast Support in LA-MPI over Quadrics

    LA-MPI is a unique MPI implementation that provides network-level fault-tolerant message passing. This paper describes the efficient implementation of a scalable MPI broadcast algorithm. LA-MPI implements a generic version of the broadcast algorithm using a spanning-tree method built on top of point-to-point messaging. However, the Quadrics network, with its hardware broadcast support, provides an opportunity for a much more efficient implementation of this collective. We describe the design challenges encountered while making use of the hardware broadcast capability, explore design alternatives, and describe the approach taken to design a low-latency, highly scalable, fault-tolerant broadcast algorithm. Our evaluation shows that this implementation reduces broadcast latency and achieves higher scalability relative to the generic version of this operation. In addition, we observe that the performance of the implementation is comparable to that of the high-performance implementations by QSW [13] for MPICH and by HP for Alaska MPI, while providing fault tolerance to network errors that these do not.
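
    For context, the generic version the paper compares against, a spanning tree layered on point-to-point messages, looks roughly like the binomial-tree sketch below. It is written with mpi4py purely for illustration and is not LA-MPI's own code:

        import numpy as np
        from mpi4py import MPI

        def tree_bcast(comm, buf, root=0):
            """Binomial-tree broadcast over point-to-point Send/Recv."""
            rank, size = comm.Get_rank(), comm.Get_size()
            rel = (rank - root) % size  # rank relative to the root
            # Receive phase: a non-root rank waits for its parent, the rank
            # obtained by clearing the lowest set bit of its relative rank.
            mask = 1
            while mask < size:
                if rel & mask:
                    comm.Recv(buf, source=(rank - mask) % size)
                    break
                mask <<= 1
            # Send phase: forward to children at decreasing
            # power-of-two distances.
            mask >>= 1
            while mask > 0:
                if rel + mask < size:
                    comm.Send(buf, dest=(rank + mask) % size)
                mask >>= 1
            return buf

        comm = MPI.COMM_WORLD
        data = np.arange(4.0) if comm.Get_rank() == 0 else np.empty(4)
        tree_bcast(comm, data)  # every rank now holds [0., 1., 2., 3.]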